SUTAV: A Turkish Audio-Visual Database

نویسندگان

  • Ibrahim Saygin Topkaya
  • Hakan Erdogan
چکیده

This paper contains information about the “Sabanci University Turkish Audio-Visual (SUTAV)” database. The main aim of collecting SUTAV database was to obtain a large audio-visual collection of spoken words, numbers and sentences in Turkish language. The database was collected between 2006 and 2010 during “Novel approaches in audio-visual speech recognition” project which is funded by The Scientific and Technological Research Council of Turkey (TUBITAK). First part of the database contains a large corpus of Turkish language and contains standart quality videos. The second part is relatively small compared to the first one and contains recordings of spoken digits in high quality videos. Although the main aim to collect SUTAV database was to obtain a database for audio-visual speech recognition applications, it also contains useful data that can be used in other kinds of multimodal research like biometric security and person verification. The paper presents information about the data collection process and the the spoken content. It also contains a sample evaluation protocol and recognition results that are obtained with a small portion of the database.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech Driven MPEG-4 Facial Animation for Turkish

In this study, a system, that generates visual speech by synthesizing 3D face points, has been implemented. The synthesized face points drive MPEG-4 facial animation. To produce realistic and natural speech animation, a codebook based technique, which is trained with audio-visual data from a speaker, was employed. An audio-visual speech database was created using a 3D facial motion capture syst...

متن کامل

Comparing the Impact of Audio-Visual Input Enhancement on Collocation Learning in Traditional and Mobile Learning Contexts

: This study investigated the impact of audio-visual input enhancement teaching techniques on improving English as Foreign Language (EFL) learnersˈ collocation learning as well as their accuracy concerning collocation use in narrative writing. In addition, it compared the impact and efficiency of audio-visual input enhancement in two learning contexts, namely traditional and mo...

متن کامل

Speaker-independent 3D face synthesis driven by speech and text

In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker independent and can be driven by audio or both audio and text. The synthesis of visual speech was realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio...

متن کامل

Labeling audio-visual speech corpora and training an ANN/HMM audio-visual speech recognition system

We present a method to label an audio-visual database and to setup a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. The multi-stage labeling process is presented on a new audiovisual database recorded at the Institute de la Communication Parlée (ICP). The database was generated via transposition of the audio databas...

متن کامل

Cross database training of audio-visual hidden Markov models for phone recognition

Speech recognition can be improved by using visual information in the form of lip movements of the speaker in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data of the same database for training their models. In this paper, we present a new approach to make use of one modality of an external dataset in ad...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012